
Routing document identifiers 



WO 99/50751 PCT/US98/20596 



The present invention relates to functionalities of computers and computer 
networks, and more particularly to the routing of document identifiers across such 
networks. 

The invention addresses problems arising in the implementation of 
techniques described in GB patent application 98 (applicants' ref. 
R/98003/JDR) based on substrates, e.g. paper, or documents produced therefrom, 
which include visible or invisible coded markings identifying the substrate and 
preferably locations or zones within it. This marking scheme in turn preferably uses 
Xerox DataGlyphs. Such substrates are referred to hereinafter as "coded substrates". 

Each physical coded substrate contains a pid-code (pid stands for 
page-identifier) which identifies it uniquely world-wide and makes it possible to 
locate the "digital-page" coupled with this physical page, which can sit anywhere on 
the global network. This pid is encoded in DataGlyphs (visibly or invisibly) on the 
surface of the page in such a way that a "pointer" equipped with a small camera can 
recover the pid by looking at a small circular area of radius r, anywhere on the page. 
With the coded substrates, the space available on a physical page for encoding the net 
address of the digital-page counterpart is at a high premium. 

Because the pid must be recovered from a small area, it is important to 
ensure that two conditions are met: 

1. The pid is encoded using a small number of bits. 

2. The pid can address unambiguously any of a large number of 
digital-pages. 

In order to respect these two conditions, the theoretically optimal scheme is 
to use a small number of bits for the pid, say 64, and to use the pid to address 2^64 
different digital-pages. (To give an idea of how big this figure is: if every inhabitant of 
the Earth were to produce 80 thousand sheets of Intelligent Paper a day for the next 
century, a 64-bit pid would be sufficient to uniquely identify all the digital-pages 
needed.) 
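
The order of magnitude quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch only: the century length of 36,525 days is an assumption, and the variable names are illustrative. It computes how many people could each produce 80 thousand sheets a day for a century before a 64-bit pid space is exhausted; under these assumptions the result is about 6.3 billion people.

```python
# Capacity check for a 64-bit pid (sheet rate taken from the text above).
SHEETS_PER_PERSON_PER_DAY = 80_000
DAYS_PER_CENTURY = 36_525          # assumption: 100 years of 365.25 days

pid_space = 2 ** 64                # number of distinct 64-bit pids
sheets_per_person = SHEETS_PER_PERSON_PER_DAY * DAYS_PER_CENTURY

# Largest population that the pid space could serve for one century.
max_population = pid_space // sheets_per_person

print(f"pid space:         {pid_space:,}")
print(f"sheets per person: {sheets_per_person:,}")
print(f"max population:    {max_population:,}")
```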

Obviously, with such a scheme, there must be a way to map the pid 
recovered by the pointer into the net address of the corresponding digital page. The 
solution that is envisaged in the abovementioned patent application (ref. 
R/98003/JDR) is to use a central router. This router contains a table of pairs (pid, 
address). The pointer sends a pid to the router and gets back the address to which it 
can then connect to retrieve the relevant digital-page. 

This centralised routing scheme has two problems, though. First, it may 
require huge tables for storing the (pid, address) pairs. Secondly, the number of 
requests per day to the router can be very large. 

There is a need for techniques for implementing a workable routing system, 
without incurring unrealistic address-storage and traffic-frequency costs at the central 
router's site. 

The present invention provides a method carried out in a data processing 
device comprising receiving a data set, the data set including a page identification 
code, sending the page identification code to a router, and receiving back from the 
router a network address, the network address being the address of a server associated 
with said page identification code. 

The invention further provides a method carried out in a data processing 
system, comprising: receiving a data set from a remote device via a network, the data 
set defining a page identification code, using association data, the association data 
defining a mapping between a plurality of page identification codes and a plurality of 
network addresses, determining a network address associated with the received page 
identification code, and transmitting the network address to the remote device. 

The invention further provides a programmable data processing device when 
suitably programmed for carrying out the methods as described above. 

The invention has the advantage of requiring smaller and more efficient 
routing facilities, and it capitalises on the tendency of publishers (responsible for 
printing on coded substrates) to buy coded substrates in bulk. The scheme also has the 
advantage that no address space is lost (64 bits of data on the paper still make it 
possible to address 2^64 digital pages). 

For the purpose of illustration it is assumed that the company producing 
(printing) the coded substrates, providing the central address routing and selling the 
pointers is one and the same company, which we will call company X. It is also 
assumed that this company sells the sheets, not directly to end users, but to publishers 
(for instance book or journal publishers) who sell the printed pages (i.e. after printing 
with human-readable information, such as the text of an article) to the end-users. 
(One centralised router is assumed for simplicity. A variant in which there are several 
mirror sites for the router is obviously compatible with the invention. More 
specifically: in the non-variant, there is one central router whose address is stored 
a priori in each pointer. Requests are sent to this router by the pointer. 

In the variant, there are several (say 10, for example) mirror routers that 
each contain the same information as the single router in the non-variant. They are 
just copies of this router, but are located at different geographical locations. The 
addresses of these 10 routers are known a priori to the pointers. When the pointer 
needs to make a request to the router, it does not matter which router copy it sends 
the request to. The pointer chooses this router randomly among the 10 it knows. If 
all pointers do the same, this has the net effect of dividing by 10 the number of 
requests that each router has to answer on average each day. It also has the effect of 
making more efficient use of the network communication channels.) 
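
The load-spreading effect of the mirror variant can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the described system: the mirror names are hypothetical placeholders, and uniform random choice via Python's standard `random` module stands in for whatever selection rule a real pointer would use.

```python
import random
from collections import Counter

# Hypothetical mirror addresses, known a priori to every pointer.
MIRRORS = [f"router-{k}.example" for k in range(10)]

def choose_router(rng: random.Random) -> str:
    """Pick one mirror uniformly at random, as each pointer does."""
    return rng.choice(MIRRORS)

def simulate(requests: int, seed: int = 0) -> Counter:
    """Count how many of `requests` routing requests land on each mirror."""
    rng = random.Random(seed)
    return Counter(choose_router(rng) for _ in range(requests))

load = simulate(100_000)
for mirror in MIRRORS:
    print(mirror, load[mirror])
```

With 100,000 simulated requests, each mirror receives close to one tenth of the total, which is the division by 10 described above.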

Further advantages of the invention are that it allows a small number of bits 
encoded on the page to address a large number of digital pages, that it minimises hits 
on the central router, and that no redundancy in the address codes is needed (compare 
hierarchical internet addresses). 

It will be appreciated that the techniques described herein may also be used 
in conjunction with the techniques described in GB patent applications 
98 (applicants' ref. R/98003/JDR) and 98 (applicants' 
ref. R/98004/JDR), filed concurrently herewith. 

Embodiments of the invention will now be described, by way of example, 
with reference to the accompanying drawings, in which: 

Figure 1 illustrates the components of a pointed document as printed on a 
coded substrate; 

Figure 2 shows a sample of zones, and the disposition of machine readable 
data, on a coded substrate; 

Figures 3 and 4 show how digital data is encoded in the zones illustrated in 
Fig. 2; 

Figure 5 schematically illustrates an embodiment of a pointer which may be 
used in implementing the invention; 
Figure 6 shows a configuration for passing page identification codes and/or 
page location codes from the pointer of Fig. 5 to a network computer, in accordance 
with an embodiment of the invention; 

Figure 7 illustrates the assignment of page-id groups and the association of 
page-ids and server addresses in an embodiment employing centralised routing; 

Figure 8 shows the retrieval of the internal address for a first page using a 
first page-id; 

Figure 9 shows the retrieval of the internal address for a second page using a 
second page-id; and 

Figure 10 is a flow chart of the processing steps in implementing the retrieval 
scheme of Fig. 8. 

Figure 1 illustrates the components of a pointed document as printed on a 
coded substrate. The printed document 102 comprises a layer 104 of printed visible 
(human-readable) information printed on a coded substrate 106. The coded substrate 
106 in turn comprises a layer 108 of visible or invisible machine readable markings 
printed on a sheet medium 110 (e.g. paper). 

Figure 2 shows a sample of zones, and the disposition of machine readable 
data, on a coded substrate. Each zone or cell 202 includes a border 204 and an 
orientation marker 206. A first set of markings 208 over part of the interior of the cell 
202 are encoded representations of the page-id, while a second set of markings 210 
over a (smaller) part of the interior of the cell 202 are encoded representations of the 
localisation (page-loc) - uniquely defining the position of the cell 202 within the page. 

Figures 3 and 4 show how digital data is encoded in the zones illustrated in 
Fig. 2. Fig. 3 shows the binary data, i.e. 47 bits of page-id in the upper section 302 
(the bit stream wraps at the cell border 204), and 16 bits of page localisation data (loc) 
in the lower section 304. The page-id code denotes 108429159095492 = 
629DA182DCC4 (hexadecimal) = 

110001010011101 101000011000001011011100110001000001000000010101 
(binary; to make the wrapping explicit). In the 16-bit loc code in section 304, there are 
8 bits for the X co-ordinate and 8 bits for the Y co-ordinate. Thus, for the cell (zone) 
shown, its position is 16,21 on the substrate. 
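
The quoted values can be checked mechanically. The following is a minimal sketch, assuming (as the quoted figures suggest) that the 63 bits shown are the 47-bit page-id followed by the 16-bit loc code, with the X byte preceding the Y byte; the function name `decode_cell` and the constants are illustrative only.

```python
from typing import Tuple

# Bit string quoted in the text, with the space (wrap marker) removed.
BITS = (
    "110001010011101"
    "101000011000001011011100110001000001000000010101"
)

PID_BITS = 47   # page-id width in this example
LOC_BITS = 16   # assumed layout: 8 bits for X followed by 8 bits for Y

def decode_cell(bits: str) -> Tuple[int, int, int]:
    """Split a cell's bit string into (page-id, x, y)."""
    if len(bits) != PID_BITS + LOC_BITS:
        raise ValueError("unexpected cell length")
    pid = int(bits[:PID_BITS], 2)
    x = int(bits[PID_BITS:PID_BITS + 8], 2)
    y = int(bits[PID_BITS + 8:], 2)
    return pid, x, y

pid, x, y = decode_cell(BITS)
print(pid, format(pid, "X"), (x, y))
# prints: 108429159095492 629DA182DCC4 (16, 21)
```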

Fig. 4 shows the same data as in Fig. 3, but represented by DataGlyph 
markings. Encoding using DataGlyphs and the retrieval of data therefrom is discussed 
further in US-A-5,486,686, EP-A-469864, and the abovementioned GB application 
(ref. R/98003/JDR). Here, there is a first set of glyphs (markings) in upper section 402 
and a second set in lower section 404, the two sets of glyphs being encoded 
representations of page-id and loc codes. 

Figure 5 schematically illustrates an embodiment of a pointer which may be 
used in implementing the invention. The pointer 502 comprises a marking device 504 
(which may be a pen or any other marking device suitable for making marks which 
are visible to a user), and an image capture device 506. In use, whether or not the user 
is making marks using the marking device 504, the image capture device 506 is able 
to capture images of an area A of a document 508. (For the sake of illustration, the 
sizes of these elements are exaggerated; e.g. in practice, the area A may be much 
closer to the tip 505 of the marking device 504 than appears.) In certain 
embodiments, the marking device 504 may be omitted. 

The document 508 may be a 'blank' coded substrate, or such a substrate 
having human-readable information printed thereon. 

Figure 6 shows a configuration for passing page identification codes and/or 
page location codes from the pointer of Fig. 5 to a network computer, in accordance 
with an embodiment of the invention. The image capture device (e.g. CCD camera) 
506 is coupled by wired or wireless (e.g. IR or RF) link to processing device 602 and 
in use provides image data defining captured images to the processing device 602. The 
operative elements of the processing device 602 are a frame grabber circuit 604, 
image decoding software 606, and a CPU 608, which are known in the art. (In certain 
embodiments, the camera 506 and processing device 602 may be combined into an 
integral handheld unit.) In use, the processing device 602 extracts from the image 
data the corresponding page-id and page-location data (<pid, loc>) and communicates 
them in a wired or wireless fashion to a local device (here, a network computer 610, 
which is linked to the network (intranet, internet) in a known manner). The computer 
610 has its own unique network address, but need not have any information output 
device (e.g. display screen, printer). 

Figure 7 illustrates the assignment of page-id groups and the association of 
page-ids and server addresses in an embodiment employing centralised routing. This 
shows the distribution 702 of page-ids, and the groups thereof (e.g. 0-a, a-b), each of 
which is encoded in one coded substrate of a batch as it is supplied to a publisher for 
printing. A table 704 is stored at the central router computer: this stores the association 
between groups 706 of consecutive page-ids and server addresses 708. Here addr_{k,l} is 
the internet address of the server for page-ids k through l. 
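
A table of this kind maps ranges of consecutive page-ids to server addresses, so one entry per group suffices. The following is a minimal sketch of such a lookup, not the implementation of table 704: the class and field names, the inclusive upper bound, the example addresses, and the use of binary search via Python's standard `bisect` module are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Iterable, NamedTuple, Optional

class Batch(NamedTuple):
    first: int   # k: first page-id of the group
    last: int    # l: last page-id of the group (assumed inclusive)
    addr: str    # addr_{k,l}: server address for the group

class RouterTable:
    """One entry per group of consecutive page-ids."""

    def __init__(self, batches: Iterable[Batch]) -> None:
        self._batches = sorted(batches)
        self._keys = [b.first for b in self._batches]

    def lookup(self, pid: int) -> Optional[Batch]:
        """Return the group containing pid, or None if no group does."""
        idx = bisect_right(self._keys, pid) - 1   # last group starting <= pid
        if idx < 0:
            return None
        batch = self._batches[idx]
        return batch if pid <= batch.last else None

# Hypothetical groups and addresses.
table = RouterTable([
    Batch(0, 999, "pub-a.example"),
    Batch(1000, 4999, "pub-b.example"),
])
print(table.lookup(1500))   # group served by pub-b.example
print(table.lookup(5000))   # None: page-id not yet assigned
```

Because only the first page-id of each group is used as a search key, the table grows with the number of batches sold rather than with the number of sheets.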

Figure 8 shows the retrieval of the internal address for a first page using a 
first page-id. 

The pointer 502, while pointed at a coded substrate, can communicate a 
page-id i to a central router 802 and to a server 804. The protocol may be described as 
follows (this is illustrated in Fig. 10). 

1. Pointer reads page whose page-id is i where m < i < n. 

2. Pointer transmits i to central router. 

3. Central router transmits (m, n, addr_{m,n}) to pointer. 

4. Triple (m, n, addr_{m,n}) is stored in pointer. 

5. Pointer transmits i to server at addr_{m,n}. 

6. Server transmits internet address of digital page for i to pointer. 

7. Pointer interacts with digital page. 

Figure 9 shows the retrieval of the internal address for a second page using a 
second page-id. Here the pointer 502 communicates page-id j to the server 804. The 
protocol is as follows. 

1. Pointer reads page whose page-id is j where m < j < n. 

2. Pointer transmits j to server at addr_{m,n}. 

3. Server transmits internet address of digital page for j to pointer. 

4. Pointer interacts with digital page. 
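
The two protocols above differ only in whether the pointer already holds a triple covering the page-id. The following is a minimal sketch of that behaviour, with in-memory objects standing in for the network. All class names, the example address and URLs, and the inclusive treatment of the bounds m and n are assumptions made for illustration.

```python
class CentralRouter:
    def __init__(self, batches):
        self.batches = list(batches)   # (m, n, addr) triples
        self.requests = 0              # number of routing requests served

    def route(self, pid):
        self.requests += 1
        for m, n, addr in self.batches:
            if m <= pid <= n:
                return m, n, addr
        raise KeyError(pid)

class PublisherServer:
    def __init__(self, pages):
        self.pages = dict(pages)       # page-id -> address of digital page

    def resolve(self, pid):
        return self.pages[pid]

class Pointer:
    def __init__(self, router, servers):
        self.router = router
        self.servers = servers         # addr -> server (stands in for the network)
        self.cache = []                # stored (m, n, addr) triples

    def click(self, pid):
        for m, n, addr in self.cache:  # Fig. 9 path: a stored triple covers pid
            if m <= pid <= n:
                break
        else:                          # Fig. 8 path: ask the central router once
            m, n, addr = self.router.route(pid)
            self.cache.append((m, n, addr))
        return self.servers[addr].resolve(pid)

# Hypothetical batch of 1000 pages served from one publisher address.
router = CentralRouter([(1000, 1999, "pub.example")])
server = PublisherServer(
    {pid: f"http://pub.example/page/{pid}" for pid in range(1000, 2000)}
)
pointer = Pointer(router, {"pub.example": server})
first = pointer.click(1005)    # first page: goes via the central router
second = pointer.click(1750)   # second page: served from the stored triple
print(first, second, router.requests)
```

In this run the central router is contacted once for the two clicks, because the second page-id falls within the stored range.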

It can be seen that techniques according to the invention may be used to 
implement the following scheme. 

1. Publisher P buys p coded substrates from X. P provides the net address A 
of a server belonging to P. 

2. X has previously altogether sold m coded substrates to other publishers (or 
to X). These sheets have been given page-ids (pid's) ranging from 0 to m-1. X 
produces p new coded substrates, numbered from m to n = m + p - 1. 

3. X installs number m as a key in its central router database (it is assumed 
that number m was previously installed in a similar way), and associates with this key 
the address addr_{m,n} = A provided by P for these p pages (see Fig. 8). 
4. At a later time, a user clicks his/her pointer at one of these p sheets for the 
first time. This results in the pid of this sheet being sent to the central router. The 
central router returns to the pointer a record (m, n, addr_{m,n}), where m, n, and 
addr_{m,n} are as above. This triple is stored in the pointer's memory (see Fig. 8). (It 
is essential that pointers be co-operative in storing these triples. This behaviour can be 
enforced through a priori specification or through some "punishing" scheme for 
uncooperative pointers. Also, because the pointer has limited memory, it may have to 
expel a previously stored record. This qualification does not seriously affect the 
proposal.) 

5. The pointer sends the pid i (a number between m and n) to the 
address addr_{m,n}. The final routing of the pid is now the responsibility of the 
publisher's server at addr_{m,n}, which retrieves the digital page associated with the 
pid (see Fig. 8). 

6. If, at a later stage, the same pointer clicks on *any* page having a pid j in 
the range [m,n[, then the pointer consults its memory first, and notices that it contains 
the record (m, n, addr_{m,n}). Rather than now consulting the central router, it consults 
directly the publisher's server at address addr_{m,n} (see Fig. 9). 
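
Steps 1 to 3 of the scheme amount to simple bookkeeping on X's side. The following is a minimal sketch of that bookkeeping, assuming the numbering "from m to n = m + p - 1" given in step 2; the class name `BatchRegistry`, the method name `sell`, and the publisher addresses are hypothetical.

```python
class BatchRegistry:
    """Tracks page-id batches sold by X and the server address of each."""

    def __init__(self):
        self.next_pid = 0   # m: number of coded substrates sold so far
        self.table = {}     # key m -> (n, address), one entry per purchase

    def sell(self, count, address):
        """Allocate `count` consecutive page-ids to the server at `address`."""
        if count <= 0:
            raise ValueError("a batch must contain at least one substrate")
        m = self.next_pid
        n = m + count - 1            # last page-id of the new batch
        self.table[m] = (n, address) # m is installed as the key
        self.next_pid = n + 1
        return m, n

registry = BatchRegistry()
print(registry.sell(1000, "pub-a.example"))   # (0, 999)
print(registry.sell(5000, "pub-b.example"))   # (1000, 5999)
print(len(registry.table))                    # 2
```

The table holds one entry per purchase regardless of how many sheets the purchase contains.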

There are two main advantages: 

1. In the central router, only one entry needs to be stored for each bulk 
(batch) purchase of coded substrates. 

2. If a journal publisher, say, buys in a single purchase enough sheets for 
a year of publication, a subscriber to the journal will access the central router only 
once in a year, because the first request to the router will result in caching the whole 
range of pages for a year of the journal. Because of this tendency for users to click 
repeatedly on pages belonging to the same sheet batch, the number of requests per day 
to the central router diminishes dramatically. 

The routing fee charged by X to the publishers can promote the scheme by 
being regressive relative to the number of sheets bought in a single purchase by a 
publisher. The rationale behind this regressive cost is the fact that the routing costs 
associated with two separate purchases by the same publisher of p and then p' sheets, 
associated with two publisher servers A and A', are higher than the routing costs 
associated with a single purchase of p+p' sheets associated with a single publisher 
server A. This difference is especially large if the fact that a pointer has "seen" the 
first batch significantly increases its probability of seeing the second one sometime in 
the future (as compared to the a priori probability of seeing the second batch). It will 
be appreciated that these principles could be repeated recursively; the publisher's 
server itself could be organised in a way similar to the central router. 
CLAIMS: 

1. A method for performing routing comprising: 
receiving an item of data that indicates a page identifier; and 
using the page identifier indicated by the item of data to obtain a network address. 